ABSTRACT

The utility of computer analysis in the assessment of
written products was studied using the WordMAP software package. Data
were collected for 92 college freshmen, using: (1) the Test of
Standard Written English (TSWE); (2) the English Composition Test of
the College Board; (3) the verbal and mathematical Scholastic Aptitude
Tests; (4) two narrative essays; (5) two expository essays; and (6)
two persuasive essays. The variables analyzed by WordMAP were used to
predict the scores that three human readers would give, both on a single
essay and, as a combined score, on the other five essays. In both
situations, the computer could predict the readers' scores reasonably
well. It is not likely that many institutions will choose to assess
writing without using human readers, but the fact that assessment of
writing skills can be enhanced through software analysis may make it
possible to reduce the amount of labor required, perhaps by using
only one reader instead of the two or three usually required.
Computer analysis also makes possible a level of feedback to students
and teachers that is not possible using human readers alone. Five
tables contain data from the study. (SLD)
Significant progress has been made in recent years in the automated
analysis of written products. Several software programs are commercially
available and others are in various stages of development. Well known are
software packages like Writer's Workbench (Frase, 1983; Kiefer & Smith,
1983), Grammatik III (UiiesanByer, 1984; Sampler, Williams, & Walker, 1988),
HOMER (Cohen & Lanham, 1984), WANDAH (Von Blum & Cohen, 1984), HBJ Writer,
and RightWriter, which detect a number of features of style and usage but
which have serious limitations (Bowyer, 1989; Gralla, 1988; Hazen, 1986).
These simple programs operate primarily by counting and string matching, and
they can often be marketed in the form of one to several low-density
diskettes. Other programs under development use pattern matching and/or
parsing and are consequently much more complex. These systems include IBM's
EPISTLE (Heidorn, Jensen, Miller, Byrd, & Chodorow, 1982), a pattern matching
program developed at the University of Pittsburgh (Hull, Ball, Fox, Levin, &
McCutchen, 1985), and WordMAP (Lytle & Mathews, undated). Of these more
complex systems, WordMAP (TM) has the advantage that it has been used
extensively in secondary schools, community colleges, universities, and even
graduate business schools.
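As a rough illustration of what "counting and string matching" means, the sketch below flags forms of "to be" by exact word match and flags a possible passive wherever such a form is immediately followed by a word ending in "-ed". It is a hypothetical example, not the rule set of Writer's Workbench, WordMAP, or any other program named above; the heuristic misses irregular participles such as "written" and wrongly flags phrases such as "was red", which shows why such programs have serious limitations.

```python
import re

# Hypothetical word list; real style checkers use their own (unpublished) lists.
TO_BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}


def style_flags(text: str) -> dict:
    """Count 'to be' and possible-passive flags by simple string matching."""
    words = re.findall(r"[a-z']+", text.lower())
    to_be = sum(1 for w in words if w in TO_BE_FORMS)
    # Crude passive heuristic: a 'to be' form followed directly by an -ed word.
    passive = sum(
        1
        for first, second in zip(words, words[1:])
        if first in TO_BE_FORMS and second.endswith("ed")
    )
    return {"to_be_flags": to_be, "passive_flags": passive}


print(style_flags("The essay was scored by three readers. It is long."))
# → {'to_be_flags': 2, 'passive_flags': 1}
```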

The assessment of writing skill is labor-intensive, especially if any
attempt is made to provide examinees with any feedback other than a single
score. Moreover, it is often unreliable because of reader disagreements and
the use of one-item assessments consisting of a single essay on a single
topic. Computer analysis of written products allows for greater detail in
the feedback that can be provided to examinees and can relieve much of the
burden on human readers by making detailed judgments unnecessary. Computer
analysis can also add to the validity of assessments made by multiple-choice
tests and essay tests judged by multiple readers. Finally, computer analysis
of writing is not limited to evaluation of one or two samples of writing; as
many samples as are available, even lengthy ones, can be analyzed.

To examine the utility of computer analysis in the assessment of written
products, we have made use of an extensive database of writing skill
assessments and other assessments collected over a number of years. We have
augmented that database with computer analyses of the same essays that were
originally scored by human readers as part of a research study. With such a
large array of variables, a number of important questions can be asked: Can
computers score essays as well as human readers? Can computers, in
conjunction with multiple-choice scores of English skills, replace readers?
If human readers are essential, how many independent readings are needed when
computer analyses and multiple-choice test scores are available? To what
degree can writing ability be predicted from a single sample of writing?

Data Source 

Data used for the project were originally collected and analyzed by
Breland et al. (1987). These data consist of College Board scores
(Scholastic Aptitude Test, English Composition Test, Test of Standard Written
English) and special essays collected for the study. The essays were written
as a part of freshman English composition courses in six different
institutions in six different states. Although a total of 270 students
completed all assignments (two narrative essays, two expository essays, and
two persuasive essays completed at home), a subsample of 92 students was used
for the computer analyses. The following variables were available for 92
cases:

The Test of Standard Written English (TSWE). A 30-minute multiple-choice
test administered at the time students are applying for college
admission.

The English Composition Test (ECT). A 60-minute achievement test in
English usually required by the most selective institutions and somewhat
more difficult than the TSWE. For some administrations, the ECT
includes a 20-minute essay test and, for those administrations, the
multiple-choice portion of the test is 40 minutes. Only the multiple-choice
portion of the test was used for the present analyses.

The Scholastic Aptitude Test, Verbal Part (SAT-V). A 60-minute
multiple-choice test of verbal aptitude.

The Scholastic Aptitude Test, Mathematical Part (SAT-M). A 60-minute
multiple-choice test of mathematical aptitude.

Essay #1 Score. The sum of three holistic scores for a 45-minute
expository essay, range 3 to 18.

Essay #2 Score. The sum of three holistic scores for a 45-minute
expository essay on a second topic, range 3 to 18.

Essay #3 Score. The sum of three holistic scores for a 45-minute
narrative essay, range 3 to 18.

Essay #4 Score. The sum of three holistic scores for a 45-minute
narrative essay on a second topic, range 3 to 18.

Essay #5 Score. The sum of three holistic scores for a persuasive essay
written first as a draft in class, discussed in a second class period,
and rewritten as a take-home assignment, range 3 to 18.

Essay #6 Score. The sum of three holistic scores for a second persuasive
essay topic written in the same way as Essay #5, range 3 to 18.

For Essay #1, the following variables were also available:

Error Rate. A manual count of errors in the essay conducted independently by
two different readers, with the two reader counts summed and divided by the
total number of words written.

Word Count. A computer count of the number of words written.

Paragraph Count. A computer count of the number of paragraphs written.

Passive Verb Flags. A computer count of the number of passive verb flags.

To Be Verb Flags. A computer count of the number of "to be" verb flags.

Subject/Verb Flags. A computer count of subject-verb disagreement flags.

Fuzzy Word Flags. A computer count of fuzzy (or overused) word usage flags.

Run-on Sentence Flags. A computer count of run-on sentence flags.

Dangler Flags. A computer count of dangling preposition flags.

Spelling Flags. A computer count of spelling flags.

Capitalization Flags. A computer count of capitalization flags.

Punctuation Flags. A computer count of punctuation flags.

Flags Score. A composite score based on all flags.

Marks Score. A composite score based on the number of punctuation mark
types used.

WordMAP Composite. A complex composite of all WordMAP variables.

Grammar Flags. A computer count of all grammatical flags.

Usage Flags. A computer count of all usage flags.

Style Flags. A computer count of all style flags.
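Several of the variables above are simple functions of the text or of readers' counts. The sketch below computes Word Count, Paragraph Count, and the number of distinct punctuation mark types (the raw quantity behind the Marks Score, whose actual scoring is not described here), plus the Error Rate exactly as defined above. The tokenization rules (whitespace-separated words, paragraphs separated by blank lines) and the punctuation set are assumptions, not WordMAP's own rules.

```python
import re
import string


def text_variables(essay: str) -> dict:
    """Hypothetical re-creation of simple counts; not WordMAP's own rules."""
    # Assume words are separated by whitespace.
    word_count = len(essay.split())
    # Assume paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", essay) if p.strip()]
    # Distinct punctuation mark types used (raw input to a Marks-style score).
    mark_types = {ch for ch in essay if ch in string.punctuation}
    return {
        "word_count": word_count,
        "paragraph_count": len(paragraphs),
        "mark_types": len(mark_types),
    }


def error_rate(reader_a_errors: int, reader_b_errors: int, word_count: int) -> float:
    """Error Rate as defined above: two readers' counts summed, divided by words."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return (reader_a_errors + reader_b_errors) / word_count
```

For example, `text_variables("First paragraph, short.\n\nSecond one; also short!")` reports 7 words, 2 paragraphs, and 4 mark types, and `error_rate(6, 4, 250)` is 0.04.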

Predicting Holistic Scoring for a Single Essay

Table 1 shows correlations between the Essay #1 score and other
available variables. The best correlates of this single essay score are the
TSWE (.60) and Error Rate (-.60, equal in magnitude), followed by the ECT
(.56), SAT-V (.54), Word Count (.50), Marks Score (.48), Flags Score (.47),
and the WordMAP Composite Score (.46). Table 1 also shows that all of these
variables correlate better with the Essay #1 score than does SAT-M (.36), but
it is interesting to observe that even SAT-M is a useful predictor of writing
skill. The surprise of Table 1 is that the counts of passive verb flags, style
flags, and usage flags all correlate positively with the holistic score for
this essay. Apparently, these kinds of variables are not considered important
by the readers of these essays.
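Each entry in Table 1 is a zero-order (Pearson) correlation computed over the 92 students. A minimal sketch of that computation follows; the four-student numbers in the usage line are invented for illustration and do not come from the study.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Zero-order (Pearson) correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Illustrative only: hypothetical error rates vs. holistic sums (3-18 scale).
print(round(pearson_r([0.02, 0.05, 0.08, 0.11], [15, 12, 10, 6]), 2))
```

A higher error rate paired with a lower holistic sum yields a negative coefficient, which is why Error Rate appears with a minus sign in the tables.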

Table 2 shows a series of multiple regression analyses in which the
Essay #1 score is predicted from selected variables. Variable Set 1 included
all multiple-choice scores, but only two of these (TSWE and SAT-V) made a
significant contribution to the prediction. The shrunken multiple R of .62
is only slightly greater than the zero-order prediction by TSWE of .60.
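The shrunken multiple R values (the parenthesized figures in Tables 2, 4, and 5, which the table notes describe as adjusted for the number of predictor variables) can be approximated with the standard adjusted R-squared correction, adjusted R^2 = 1 - (1 - R^2)(N - 1)/(N - k - 1), where N is the number of cases and k the number of predictors. The paper does not state which correction was used; the sketch below assumes this formula and assumes k is the number of predictors listed for each set. Under those assumptions it returns .62, .74, and .78 for the three sets of Table 2, matching the tabled values.

```python
import math


def shrunken_r(r: float, n: int, k: int) -> float:
    """Multiple R adjusted for k predictors and n cases (standard adjusted R^2)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    r2_adj = 1 - (1 - r ** 2) * (n - 1) / (n - k - 1)
    # Adjusted R^2 can fall below zero for weak predictions; report 0 then.
    return math.sqrt(max(r2_adj, 0.0))


# Table 2, N = 92; k assumed to be the number of predictors listed per set.
for label, r, k in [("Set 1", 0.63, 2), ("Set 2", 0.77, 10), ("Set 3", 0.80, 7)]:
    print(label, round(shrunken_r(r, 92, k), 2))
```

The correction grows with k, which is why the ten-predictor Set 2 loses more (.77 to .74) than the two-predictor Set 1 (.63 to .62).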

Variable Set 2 included only computer-generated scores, and ten of these
contributed to the shrunken multiple R of .74. The word count, the count of
usage flags, and the count of fuzzy word flags contributed most to the
prediction. Again, the positive beta weights for the usage and passive verb
flags are interesting. Variable Set 2 shows, as did Page (1968) many years
ago, that a computer can predict reasonably well the scores assigned by human
readers of essays. It is of interest to observe also that the multiple R of
.74 obtained from computer analysis is larger than the magnitude (.60) of the
zero-order r obtained from a manual count of errors in the same essays by two
different readers.

Variable Set 3 in Table 2 combined multiple-choice tests and computer-generated
scores, and seven of these variables contributed to the prediction.
The shrunken multiple R of .78 is only slightly larger than that of .74
obtained using the computer scores alone.
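The beta weights in Tables 2, 4, and 5 are standardized regression coefficients. One way to see how they arise: if R_xx is the matrix of correlations among the predictors and r_xy the vector of their correlations with the criterion, the betas solve R_xx beta = r_xy, and R^2 = r_xy . beta. Because the predictor intercorrelations are not reported here, the sketch below cannot reproduce the tables; it uses NumPy and a hypothetical two-predictor example.

```python
import numpy as np


def standardized_regression(r_xx: np.ndarray, r_xy: np.ndarray) -> tuple[np.ndarray, float]:
    """Return standardized betas and multiple R from correlation matrices."""
    betas = np.linalg.solve(r_xx, r_xy)  # solve R_xx @ beta = r_xy
    r_squared = float(r_xy @ betas)      # R^2 = r_xy' beta
    return betas, float(np.sqrt(r_squared))


# Hypothetical example: two predictors correlating .50 with each other,
# and .60 and .40 with the criterion (not values from this study).
r_xx = np.array([[1.0, 0.5], [0.5, 1.0]])
r_xy = np.array([0.6, 0.4])
betas, multiple_r = standardized_regression(r_xx, r_xy)
print(np.round(betas, 3), round(multiple_r, 3))
```

With uncorrelated predictors (R_xx equal to the identity matrix), each beta equals the predictor's zero-order correlation; once predictors are correlated, a beta can differ markedly in size, or even in sign, from the corresponding zero-order correlation.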
Predicting Writing Ability More Generally

We now turn to a somewhat different type of analysis in which we
predict not the score on a single essay, but the combined scores on five
different essays, excluding the essay analyzed by WordMAP (TM). In other
words, we will show that we can predict a student's ability to write more
generally using only multiple-choice scores and a computer analysis of a
single 45-minute essay. The five-essay criterion is based on a combination
of narrative, expository, and persuasive writing, written both in class under
timed conditions and outside of class without timing following class
discussion of the essay assignments. The five-essay criterion is therefore a
good measure of each student's writing ability. With an alpha reliability of
.88, the five-essay criterion correlates well with a number of variables.
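Coefficient alpha for the five-essay criterion treats each essay score (3 to 18) as one part of the sum: alpha = k/(k - 1) x (1 - sum of part variances / variance of the total), with k = 5 here. A minimal sketch follows, assuming a simple list-of-rows layout (one row per student, one column per essay); the study's essay-level data are not reproduced, so no values from the paper are computed.

```python
from statistics import pvariance


def coefficient_alpha(score_rows: list[list[float]]) -> float:
    """Cronbach's alpha; each row holds one student's scores on the k parts."""
    k = len(score_rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two parts")
    columns = list(zip(*score_rows))           # scores grouped by part (essay)
    part_var = sum(pvariance(col) for col in columns)
    totals = [sum(row) for row in score_rows]  # e.g., the five-essay sum
    return k / (k - 1) * (1 - part_var / pvariance(totals))
```

Population variances are used for both the parts and the total; because the same divisor appears in numerator and denominator, the ratio is the same as with sample variances.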

Table 3 shows simple correlations between the five-essay criterion and
available variables. Setting aside the holistic reader scores for Essay #1,
the best correlate of the five-essay criterion was the TSWE (.72), followed
by the ECT (.70), Error Rate (-.62), SAT-V (.58), Flags Score (.47), Marks
Score (.44), WordMAP Composite (.42), and Word Count (.40). Once again, note
that SAT-M is also a good predictor of writing ability (.39), almost as good
as some of the other variables. But it is important to emphasize that Error
Rate, Flags Score, Marks Score, WordMAP Composite, and Word Count are based on
only a single essay, which was not one of the five essays included in the
five-essay criterion. Again, it is of interest to note the positive
correlations generated by style and usage flags. That is, style (which is
concerned with split infinitives, the use of passive and "to be" verbs, the
use of first-person references like "I" or "me," starting sentences with
"and" or "but," and the like) appears not to be considered especially
important by readers of college freshman English papers. The same appears to
be true of usage (which is concerned with the use of clichés, vague, weak, or
fuzzy words, slang, and colloquialisms, for example).

Table 4 shows that good predictions of writing ability can be made
without the use of human readers. The multiple-choice scores of Variable Set
1 yielded a shrunken multiple R of .73, and the computer-generated variables
of Variable Set 2 yielded a shrunken multiple R of .66. When both
multiple-choice scores and computer-generated scores are combined in Variable
Set 3, the shrunken multiple R increases to .82.

The use of variables like word count and paragraph count may be viewed
by some as "systemically invalid" because feedback of this type to the
writers of the essays would not necessarily improve their writing ability
(Frederiksen & Collins, 1989). Others view such variables as "corruptible"
because knowledge of them could result in faking by examinees. But a case
can be made for a count of words written on a timed test, since it is a good
measure of verbal fluency, an ability not often measured by other tests
(Sincoff & Sternberg, 1987).

In "Table 5 introduce the human reader as a predictor of writing 
ability. As we hava noted previously, the ^sssy #1 that we have analysed by 
ooai^juter was also read and scored holistically ty three diffenant readers. 
Ihe sum of their 'scores was the d^jendent variable in -Tables 1 and 2 . Now we 
wish to (tetermina how well these reader scores predict tl« five-essay 
criterion, v4iicn exclirfes the Essay #1 soore. Variable Set 1 in l^le 5 
shews that the inuitiple oorrelatic^i of these three reader scores prGdictc«d 
the five-tissay criterion c^ite (R .74) , but not nearly as well as the 
combination o? miltiple-^oioe scores and o^^ter analysis (of the saine 
essay) sho^n in Table 4 (R « ,82) . 

Variable Set 2 in Table 5 adds the Error Rate, and thus two more human
readers, to the prediction. The multiple R of .76 shows that even five human
readers of a single essay do not do as well at predicting writing ability as
did the combination of multiple-choice and computer scores in Table 4.

Variable Set 3 simulates a common type of writing assessment in which
two reader scores are combined with one multiple-choice test score. The
second and third readers were chosen because their combined performance was
better than other reader combinations. The multiple correlation obtained
with Variable Set 3 (R = .80) is comparable to that obtained using
multiple-choice scores in combination with computer analysis in Table 4
(R = .82). Replacing the TSWE with the ECT in Variable Set 3 makes no
significant difference in the analysis.
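Choosing the second and third readers for Variable Set 3 amounts to comparing the multiple R obtained by each pair of readers. A minimal sketch of such a comparison follows, fitting ordinary least squares with an intercept via NumPy; the function names are hypothetical, and the study may have compared pairs differently (for example, with the TSWE already in the equation).

```python
from itertools import combinations

import numpy as np


def multiple_r(predictors: np.ndarray, criterion: np.ndarray) -> float:
    """Multiple R from an OLS fit with intercept (predictors: n x p array)."""
    X = np.column_stack([np.ones(len(criterion)), predictors])
    coef, *_ = np.linalg.lstsq(X, criterion, rcond=None)
    fitted = X @ coef
    # Multiple R is the correlation between fitted and observed criterion.
    return float(np.corrcoef(fitted, criterion)[0, 1])


def best_reader_pair(readers: np.ndarray, criterion: np.ndarray) -> tuple[tuple[int, int], float]:
    """Try every pair of reader columns; return the pair with the largest R."""
    results = {
        pair: multiple_r(readers[:, list(pair)], criterion)
        for pair in combinations(range(readers.shape[1]), 2)
    }
    best = max(results, key=results.get)
    return best, results[best]
```

Columns are numbered from 0, so a result of `(1, 2)` would correspond to the second and third readers.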
Variable Set 4 combines the two human readers with the computer-generated
scores. The shrunken multiple R of .77 is almost as high as that
obtained in the simulated assessment of Variable Set 3 (.80), and it avoids
the use of multiple-choice tests.

Variable Set 5 in Table 5 uses the available multiple-choice and
computer-generated variables, together with one reader's score, to predict
writing ability and shows that only a single reading of the essay is
necessary when multiple-choice test scores and computer analysis are combined
with a human reading. Inclusion of the third reader in Variable Set 6 did not
increase the multiple correlation beyond the .85 value obtainable with only
one reading. Note that SAT-M and the paragraph count are suppressor
variables: their beta weights are negative although their zero-order
correlations with the criterion are positive.

Conclusions 

These results are important because they show that assessments of
writing skill can be enhanced through the use of text analysis software.
Although it is not likely that many institutions will choose to attempt such
assessments without human readers, it will clearly be possible to reduce the
amount of labor required, perhaps by using only one reading rather than two
or three as is at times the custom.

Equally important is that computer analysis of student essays can
provide a level of detail in feedback to students, teachers, and others that
is not possible using human readers alone. This kind of feedback has
important implications for instruction in English composition. Moreover,
computer analysis can provide detailed feedback on many written products,
even lengthy ones, whereas a teacher of English will normally provide detailed
feedback on only a few brief essays.

Finally, the analysis of free responses in essay form as a means of
assessing writing skill would appear to be a promising alternative to
multiple-choice tests, which are viewed by some as having negative
consequences for instruction, especially in composition instruction.

Table 1. Correlations Between Predictor Variables
and Essay #1 Score
(N = 92)

Predictor Variable                 Correlation With
                                   Holistic Rating*

Multiple-Choice Scores
   TSWE                                  .60
   ECT                                   .56
   SAT-V                                 .54
   SAT-M                                 .36

Reader Score
   Error Rate**                         -.60

Selected WordMAP Variables
   Word Count                            .50
   Passive Verb Flags                    .15
   To Be Verb Flags                      .00
   Subject/Verb Flags                   -.26
   Fuzzy Word Flags                     -.07
   Run-on Sentence Flags                -.16
   Dangler Flags                        -.29
   Spelling Flags                       -.25
   Capitalization Flags                 -.04
   Punctuation Flags                     .00

WordMAP Composite Scores
   WM Composite                          .46
   Marks Score                           .48
   Flags Score                           .47
   Grammar Flags                        -.35
   Style Flags                           .10
   Structure Flags                      -.35
   Usage Flags                           .04

*The sum of three ratings made independently by three different readers.
**The sum of error counts made independently by two different readers divided
by the number of words written.



Table 2. Multiple Regression Predictions of
Essay #1 Score
(N = 92)

Dependent     Predictor                  P-Value    beta    R*
Variable      Variables

Essay #1      Variable Set 1
Score**          TSWE                      .00       .44    .63 (.62)
                 SAT-V                     .04       .23
              Variable Set 2
                 Word Count                .01       .29    .77 (.74)
                 WM Composite              .02       .23
                 Flags Score               .07       .19
                 Marks Score               .19       .15
                 Structure Flags           .22      -.10
                 Usage Flags               .02       .28
                 Grammar Flags             .10      -.14
                 Passive Verb Flags        .02       .19
                 Dangler Flags             .14      -.12
                 Fuzzy Word Flags          .02      -.26
              Variable Set 3
                 TSWE                      .03       .34    .80 (.78)
                 SAT-V                     .02       .27
                 ECT                       .22      -.19
                 Word Count                .00       .29
                 WM Composite              .11       .12
                 Flags Score               .01       .24
                 Marks Score               .14       .16

*Figures in parentheses adjusted for the number of predictor variables.
**The sum of three holistic ratings of Essay #1.



Table 3. Correlations Between Predictor Variables and
Writing Ability (N = 92)

Predictor Variables                         Correlation With
                                            Five-Essay Score Sum*

Scores on Multiple-Choice Tests
   TSWE                                           .72
   ECT                                            .70
   SAT-V                                          .58
   SAT-M                                          .39

Reader Scores
   Error Rate, Essay #1**                        -.62
   Essay #1 Score                                 .74
   Essay #1, 1st Reader Score                     .64
   Essay #1, 2nd Reader Score                     .68
   Essay #1, 3rd Reader Score                     .61

Selected WordMAP Variables
   Word Count, Essay #1                           .40
   Paragraphs, Essay #1                           .03
   Passive Verb Flags, Essay #1                   .07
   To Be Verb Flags, Essay #1                     .05
   Subject/Verb Flags, Essay #1                   .27
   Fuzzy Word Flags, Essay #1                     .04
   Run-on Sentence Flags, Essay #1                .11
   Dangler Flags, Essay #1                        .24
   Spelling Flags, Essay #1                       .33
   Capitalization Flags, Essay #1                 .11
   Punctuation Flags, Essay #1                    .01

WordMAP Composite Scores
   WM Composite, Essay #1                         .42
   Marks Score, Essay #1                          .44
   Flags Score, Essay #1                          .47
   Grammar Flags, Essay #1                       -.25
   Style Flags, Essay #1                          .20
   Usage Flags, Essay #1                          .10

*Sum of reader scores on 5 essays, excluding Essay #1.
**The sum of error counts for Essay #1 made by two different readers
divided by the number of words written for this essay.



Table 4. Multiple Regression Predictions of Writing Ability
Without Human Readers
(N = 92)

Dependent     Predictor              Predictor        beta    Multiple R*
Variable      Variables              Significance
                                     (P-Value)

Holistic      Variable Set 1
Sum**            TSWE                   .00            .46    .74 (.73)
                 ECT                    .04            .03
              Variable Set 2
                 Word Count             .01            .03    .69 (.66)
                 Paragraph Count        .01           -.24
                 WM Composite           .10            .18
                 Flags Score            .00            .33
                 Marks Score            .17            .02
                 Usage Flags            .05            .18
                 Dangler Flags          .18           -.12
              Variable Set 3
                 TSWE                   .00            .45    .84 (.82)
                 SAT-V                  .01            .02
                 SAT-M                  .11           -.01
                 Word Count             .01            .02
                 Paragraph Count        .00
                 Flags Score            .00            .02
                 Marks Score            .05            .02
                 Usage Flags            .01            .16

*Figures in parentheses adjusted for the number of predictor variables.
**The sum of five essay scores, excluding the Essay #1 score.






Table 5. Multiple Regression Predictions of
Writing Ability With Human Readers
(N = 92)

Dependent     Predictor                  P-Value    beta    Multiple R*
Variable      Variables

Holistic      Variable Set 1
Sum**            1st Reader, Essay #1      .00       .29    .75 (.74)
                 2nd Reader, Essay #1      .00       .40
                 3rd Reader, Essay #1      .09       .17
              Variable Set 2
                 1st Reader, Essay #1      .09       .17    .78 (.76)
                 2nd Reader, Essay #1      .00       .33
                 3rd Reader, Essay #1      .08       .15
                 Error Rate, Essay #1      .00      -.27
              Variable Set 3
                 2nd Reader, Essay #1      .00       .27    .81 (.80)
                 3rd Reader, Essay #1      .01       .23
                 TSWE                      .00       .46
              Variable Set 4
                 2nd Reader, Essay #1      .00       .41    .79 (.77)
                 3rd Reader, Essay #1      .18       .13
                 Word Count, Essay #1      .14       .02
                 Paragraphs, Essay #1      .01      -.20
                 WM Composite, Essay #1    .22       .01
                 Marks Score, Essay #1     .26       .01
                 Flags Score, Essay #1     .02       .02
                 Usage Flags, Essay #1     .07       .14
              Variable Set 5
                 2nd Reader, Essay #1      .00       .27    .87 (.85)
                                           .06       .21
                 SAT-V                     .08       .12
                 SAT-M                     .12      -.01
                 ECT                       .19       .17
                 Word Count, Essay #1      .03       .17
                 Paragraphs, Essay #1      .00      -.20
                 WM Composite, Essay #1    .30       .10
                 Flags Score, Essay #1     .12       .11
                 Marks Score, Essay #1     .27       .11
                 Usage Flags, Essay #1     .01       .19

*Figures in parentheses adjusted for the number of predictor variables.
**The sum of 15 holistic scores on 5 different essays, excluding Essay #1.






